How does Python Pandas send a data frame to Statistica?

Article ID: KB0076655

Products: Spotfire Statistica
Versions: 13.5

Description

The Python integration with Statistica lets users write Python scripts in a node that connects to downstream nodes within the Statistica workspace. When the Pandas package is used in a Python node to pass data to Statistica, all columns are read as Double type by default. As a result, string variables become Double variables with text labels. This may lead to unexpected results when importing a column that mixes strings and numbers.

Here is more information about the Double Type variable in Statistica:
The Double (short for Double Precision) data type is the default format for storing numeric values in Statistica. Technically, the values are stored as 64-bit floating point real numbers with 15-digit precision (1 bit for the sign, 11 for the exponent, and 52 for the mantissa). The range of values supported by this data type is approximately ±1.7*(10^308). Each numeric value can have a unique text label attached (see Text Labels Editor), of practically unlimited length when the Display format is General. This is the only data type that allows numbers containing decimals. When the data type is Double, each cell takes up 8 bytes of storage (plus the optional text label). Note that the missing data code for the Double data type is -999999998.
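The storage figures quoted above match the standard IEEE 754 double-precision format, which Python also uses for its float type. The following minimal sketch checks them from Python. The name STATISTICA_MISSING_CODE is a hypothetical constant introduced here for illustration; it is not part of Python, Pandas, or any Statistica API.

```python
import struct
import sys

# Properties of the IEEE 754 double-precision format, as reported by Python.
info = sys.float_info
bytes_per_value = struct.calcsize("<d")  # storage size of one double: 8 bytes
decimal_digits = info.dig                # decimal digits that always survive a round trip
mantissa_bits = info.mant_dig            # 53 = 52 stored bits + 1 implicit leading bit
largest_value = info.max                 # roughly 1.797e308

# Missing data code quoted in the description above.
# Hypothetical name, used only in this sketch.
STATISTICA_MISSING_CODE = -999999998.0

print(bytes_per_value, decimal_digits, mantissa_bits, largest_value)
```

Python reports 53 mantissa bits because it counts the implicit leading bit in addition to the 52 stored bits.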

Issue/Introduction

This article explains how variables are imported when sending data from Python Pandas to Statistica.

Resolution

The following Python node example shows how to import Excel data into Statistica via Python Pandas and route it to the workbook output.
[Screenshot: Python node example]

If any variable in the Excel sheet contains mixed value types, that is, both strings and numbers, the output table may be unexpected.
For instance, suppose a column in the Excel sheet contains both the text string "a" and the numeric value 101:
[Screenshot: Excel sheet column containing the text string "a" and the numeric value 101]
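Such columns can be spotted in Pandas before the data frame is handed to Statistica. The sketch below is one possible check, not part of the Statistica integration. The data frame, the column name Var1, and the helper mixed_type_columns are hypothetical; in practice the frame would typically come from pd.read_excel.

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet: one column mixing numbers and text.
df = pd.DataFrame({"Var1": [1, 2, 3, "a", 5, 101]})


def mixed_type_columns(frame):
    """Return names of columns that hold both text and non-text values."""
    mixed = []
    for name in frame.columns:
        values = frame[name].dropna()  # ignore missing cells
        is_text = values.map(lambda v: isinstance(v, str))
        # Mixed means: at least one string, but not only strings.
        if is_text.any() and not is_text.all():
            mixed.append(name)
    return mixed


print(mixed_type_columns(df))
```

Any column reported by this check is a candidate for the label collision described in this article.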

After the Excel data is imported into Statistica via Python Pandas, row 6, whose original value is the number 101, is displayed as "a" in the Statistica spreadsheet. Because the column is imported as Double type, each text string in it is assigned a numeric value, starting from 101 by default. In this case, the string "a" is assigned the value 101. When the number 101 also happens to occur in the column, Statistica matches it against the existing mapping of 101 to the text label "a" and displays the text label in the output.
[Screenshot: Statistica spreadsheet in which row 6 displays "a"]
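The collision can be reproduced with a small toy model in plain Python. This only simulates the behavior described above and is not Statistica's actual implementation. The function names and sample data are hypothetical, and the rule that each further distinct string gets the next integer code (102, 103, and so on) is an assumption.

```python
# Toy model of the text-label behavior described above; NOT Statistica code.
FIRST_LABEL_CODE = 101  # default starting code stated in this article


def to_double_with_labels(column):
    """Encode a column as doubles plus a mapping from numeric code to text label."""
    labels = {}   # numeric code -> text label
    codes = {}    # text label -> numeric code
    encoded = []
    for value in column:
        if isinstance(value, str):
            if value not in codes:
                # Assumption: each new string gets the next free integer code.
                code = float(FIRST_LABEL_CODE + len(codes))
                codes[value] = code
                labels[code] = value
            encoded.append(codes[value])
        else:
            encoded.append(float(value))
    return encoded, labels


def display(encoded, labels):
    """Show the text label whenever a stored number has one attached."""
    return [labels.get(value, value) for value in encoded]


column = [1, 2, 3, "a", 5, 101]
encoded, labels = to_double_with_labels(column)
shown = display(encoded, labels)
print(shown)  # [1.0, 2.0, 3.0, 'a', 5.0, 'a']
```

The last cell held the number 101, but because 101 is also the code assigned to "a", the model displays it as "a", mirroring row 6 in the example.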

Workaround: Split the column that mixes strings and numbers into separate columns so that each column contains values of a single type. Alternatively, use the import data node available in Statistica to import the data, whenever applicable.
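One way to perform the split in Pandas before the data reaches Statistica is sketched below. The data frame and the column names Var1, Var1_num, and Var1_text are hypothetical. How the resulting missing cells appear in Statistica is not covered by this article and should be verified in the workspace.

```python
import pandas as pd

# Hypothetical stand-in for the imported Excel sheet.
df = pd.DataFrame({"Var1": [1, 2, 3, "a", 5, 101]})

# Flag the cells that hold text.
is_text = df["Var1"].map(lambda v: isinstance(v, str))

# Numeric column: text cells are replaced by NaN (missing).
df["Var1_num"] = pd.to_numeric(df["Var1"].where(~is_text), errors="coerce")

# Text column: numeric cells are replaced by NaN (missing).
df["Var1_text"] = df["Var1"].where(is_text)

# Drop the original mixed column so each remaining column has a single type.
df = df.drop(columns="Var1")

print(df)
```

With the values separated, the number 101 stays in the numeric column and no longer shares a column with the text label "a".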